Basic aspects of dot products for artificial neural networks:

Algebraic
Geometric
Simultaneous equations
Variance 
Linear combination of random variables
Central limit theorem  
Random projection
Efficient - FFT, Walsh-Hadamard transform (WHT)
Interaction with nonlinearity
Switched - ReLU literal switch
Angle between vector and binarized bipolar (+1,-1) version of the same vector 
Affine

*ReLU is a literal switch*
An electrical switch is n volts in, n volts out when on, and zero volts out when off. ReLU behaves the same way: f(x)=x when x>=0 (on) and f(x)=0 when x<0 (off).

A weighted sum (dot product) of a number of weighted sums is itself just a weighted sum of the original inputs, so the system is still linear.

For a particular input to a ReLU neural network, every switch is definitely in either the on or the off state. A particular linear projection is therefore in effect between the input and the output.

For a particular input and a particular output neuron there is a particular composition of weighted sums that may be condensed down into a single equivalent weighted sum of the input.
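
As a minimal sketch of that condensing step, assume a two-layer ReLU network with no bias terms and random weights. The names `relu_forward` and `equivalent_weights` and the layer sizes are made up for illustration. For one input, the on/off states are recorded and the two layers are collapsed into a single matrix that gives the same output for that input.

```python
import random

def matvec(W, x):
    """Each output element is one weighted sum (dot product) of x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu_forward(W1, W2, x):
    """Two-layer, bias-free ReLU net. Returns the output and the switch states."""
    h = matvec(W1, x)
    switches = [1.0 if v >= 0 else 0.0 for v in h]   # on = 1, off = 0
    a = [s * v for s, v in zip(switches, h)]         # ReLU as a literal switch
    return matvec(W2, a), switches

def equivalent_weights(W1, W2, switches):
    """Collapse W2 * diag(switches) * W1 into one matrix, valid for this input."""
    n_in, n_hid = len(W1[0]), len(W1)
    return [[sum(W2[o][j] * switches[j] * W1[j][i] for j in range(n_hid))
             for i in range(n_in)]
            for o in range(len(W2))]

random.seed(0)
W1 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]  # 4 inputs, 6 hidden
W2 = [[random.gauss(0, 1) for _ in range(6)] for _ in range(2)]  # 2 outputs
x = [random.gauss(0, 1) for _ in range(4)]

y, switches = relu_forward(W1, W2, x)
W_eq = equivalent_weights(W1, W2, switches)
y_eq = matvec(W_eq, x)   # one weighted sum per output neuron
print(y, y_eq)           # the two should agree up to rounding
```

Each row of `W_eq` is the single equivalent weight vector for one output neuron. It is only valid while the switch states stay the same.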

The same holds for any particular ReLU inside the network. During forward propagation it is fed a particular composition of weighted sums, and the value it makes its on/off decision on may likewise be condensed down into a single equivalent weighted sum of the input.

You can inspect that equivalent weighted sum to see what the network is looking at in the input, or calculate metrics such as the angle between the input and the weight vector of the equivalent weighted sum.
If the angle is near 90 degrees and the output of the neuron is large, then the length of the weight vector must be large. That makes the output very sensitive to noise in the inputs. If the angle is near zero, then averaging and central limit theorem effects provide some error correction.
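
A small sketch of the angle metric follows. The two weight vectors are made-up examples, not taken from any real network. Both give the same output for the same input, but the nearly perpendicular one needs a far longer weight vector to do so, and the worst-case effect of input noise on a dot product scales with that length.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle_degrees(u, v):
    """Angle between two vectors, with the cosine clamped against rounding."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

x = [1.0, 2.0, 3.0, 4.0]
w_aligned = [0.1, 0.2, 0.3, 0.4]     # parallel to x
w_skewed = [100.0, -50.0, 1.0, 0.0]  # nearly perpendicular to x

for name, w in (("aligned", w_aligned), ("skewed", w_skewed)):
    print(name, "output:", dot(x, w),
          "angle:", angle_degrees(x, w),
          "weight length:", norm(w))
```

Both outputs are about 3, but the skewed weight vector is roughly 200 times longer, so a small perturbation of `x` can move its output roughly 200 times further.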

Since ReLU switches at zero, where the on and off outputs are both zero, there are no sudden discontinuities in the output of a ReLU neural network for a gradual change in the input. It is a seamless system of switched linear projections.

There are efficient algorithms for calculating certain fixed sets of dot products, such as the FFT or WHT.
There is no reason you cannot incorporate those directly into ReLU neural networks, since they are fully compatible.
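
For reference, here is a minimal sketch of the fast WHT in plain Python, assuming the input length is a power of two and using the unnormalized, natural (Hadamard) ordering. It uses only additions and subtractions and takes about n*log2(n) of them, rather than the n*n multiply-adds of n explicit dot products.

```python
def wht(x):
    """Unnormalized fast Walsh-Hadamard transform of a power-of-two length list."""
    x = list(x)                      # work on a copy
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b   # butterfly: sum and difference
        h *= 2
    return x

v = [1, 2, 3, 4]
t = wht(v)
print(t)         # [10, -2, -4, 0]
print(wht(t))    # applying it twice gives n times the input: [4, 8, 12, 16]
```

Each output element is the dot product of the input with one +1/-1 pattern, which is why the transform can stand in for a set of fixed weighted sums.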


*Associative memory*
You can linearly scramble a vector with random projection. That is often used for 'fair' dimension reduction.
You can also expand the dimension by taking multiple different vector-to-vector random projections of the input data. You get multiple different windows on the input data.
Of course, being linear, random projection on its own leaves the problems of alikeness and linear separation in place. However, applying a non-linear function to all the elements of the random projection(s) makes the resulting vectors more nearly orthogonal in high-dimensional space.
Since the data has been scrambled and made nearly orthogonal, a simple weighted sum (dot product) makes an effective associative memory. The statistical properties of the dot product, from the list of absolute basics, still apply.
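
A rough sketch of such an associative memory follows, under several assumptions that are not from the text above: a dense Gaussian random projection, a sign function as the non-linearity, scalar stored values, and a simple error-correcting update repeated for a few passes. The names `features`, `recall` and `store` are made up for illustration.

```python
import random

random.seed(1)
N_IN, N_FEAT = 32, 512   # expand 32 inputs to 512 features (assumed sizes)

# Fixed random projection matrix (assumption: Gaussian entries).
R = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_FEAT)]

def features(x):
    """Random projection followed by a sign non-linearity (+1 or -1)."""
    return [1.0 if sum(r * xi for r, xi in zip(row, x)) >= 0 else -1.0
            for row in R]

def recall(w, x):
    """The memory readout is a single weighted sum over the features."""
    return sum(wi * fi for wi, fi in zip(w, features(x)))

def store(w, x, target):
    """Nudge the weights so that recall(w, x) equals target."""
    f = features(x)
    err = target - sum(wi * fi for wi, fi in zip(w, f))
    step = err / len(f)              # f.f == len(f) since entries are +1/-1
    for i, fi in enumerate(f):
        w[i] += step * fi

keys = [[random.gauss(0, 1) for _ in range(N_IN)] for _ in range(3)]
values = [1.0, -2.0, 0.5]
w = [0.0] * N_FEAT
for _ in range(30):                  # a few passes to settle interference
    for k, v in zip(keys, values):
        store(w, k, v)

for k, v in zip(keys, values):
    print(v, recall(w, k))
```

Because the feature vectors of different keys are close to orthogonal, storing one item disturbs the others only slightly, and the repeated passes remove the remaining interference.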

*Efficiency*
Random sign flipping followed by the WHT is a fast way to get random projections, i.e. HDx for an input vector x, where D is a diagonal matrix of random +1,-1 entries and H is the WHT matrix (applied with the fast algorithm).
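
A minimal sketch of the HD projection, assuming a power-of-two input length and a 1/sqrt(n) scaling so that vector length is preserved. The helper names `random_signs` and `fast_random_projection` are made up for illustration.

```python
import math
import random

def wht(x):
    """Unnormalized fast Walsh-Hadamard transform (power-of-two length)."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def random_signs(n, seed=0):
    """The diagonal of D: a fixed random pattern of +1 and -1."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def fast_random_projection(x, signs):
    """Compute H D x, scaled by 1/sqrt(n) to keep the vector length unchanged."""
    scale = 1.0 / math.sqrt(len(x))
    flipped = [s * xi for s, xi in zip(signs, x)]   # D x
    return [v * scale for v in wht(flipped)]        # H (D x)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
signs = random_signs(len(x))
y = fast_random_projection(x, signs)
print(y)
print(sum(v * v for v in x), sum(v * v for v in y))  # squared lengths match
```

Using a different seed for `random_signs` gives a different projection of the same input, which is one cheap way to get multiple windows on the data.
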
You can use the WHT as a replacement for all the weighted sums in a neural network layer. Then the weights of each neuron are fixed to some WHT 'sequency' pattern, so something else must be made adjustable. You can individually parameterize the non-linear functions instead, say f(x)=a.x for x>=0 and f(x)=b.x for x<0, with a separate adjustable a and b for each neuron.
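
A sketch of one such layer follows. It assumes the two-slope function is applied to each WHT output (the ordering is not spelled out above), a 1/sqrt(n) scaling, and hand-picked slopes in place of trained ones. The names `two_slope` and `wht_layer` are made up for illustration.

```python
def wht(x):
    """Unnormalized fast Walsh-Hadamard transform (power-of-two length)."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                p, q = x[j], x[j + h]
                x[j], x[j + h] = p + q, p - q
        h *= 2
    return x

def two_slope(x, a, b):
    """f(x) = a*x for x >= 0, f(x) = b*x for x < 0."""
    return a * x if x >= 0 else b * x

def wht_layer(x, a, b):
    """Fixed WHT weighted sums, then an individually parameterized activation."""
    scale = len(x) ** -0.5
    return [two_slope(v * scale, ai, bi) for v, ai, bi in zip(wht(x), a, b)]

x = [1, 2, 3, 4]
a = [1.0, 1.0, 1.0, 1.0]   # slopes for non-negative values (the adjustable part)
b = [0.5, 0.5, 0.5, 0.5]   # slopes for negative values (the adjustable part)
out = wht_layer(x, a, b)
print(out)
```

Here the layer has 2n adjustable parameters in place of the n*n weights of a dense layer, and setting a=1, b=0 for every element recovers plain ReLU.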
